Setup
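library(knitr)  # load knitr first so that opts_knit is available
# Resolve relative data paths in later chunks against the write-up folder
wd <- "C:\\Users\\heinr\\OneDrive\\Desktop\\LARGE DATA\\LinkedIn\\Write-up"
opts_knit$set(root.dir = wd)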
Preview Data
Individual profile descriptions
read.csv(file = '../company_level_individual_profiles/24-hour-fitness_person_profile_50_word_cutoff.csv')
Individual descriptions are stored in per-company files … What about duplicates?
Individual employment duration by company
read.csv(file = '../company_level_individual_stay_term/24-hour-fitness_person_stay_term.csv')
These are in separate files, one per company.
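One way to approach the duplicates question is to stack the per-company files and look for people who occur in more than one of them. The sketch below is only an illustration: it writes two toy CSVs to a temporary folder, and the column name `person_id`, the helper `read_company_files()`, and the toy file names are assumptions, not the actual layout of the profile files.

```r
# Hypothetical helper: read every CSV in `dir` and stack the rows,
# recording which file each row came from.
read_company_files <- function(dir, pattern = "\\.csv$") {
  paths <- list.files(dir, pattern = pattern, full.names = TRUE)
  parts <- lapply(paths, function(p) {
    df <- read.csv(p, stringsAsFactors = FALSE)
    df$source_file <- basename(p)
    df
  })
  do.call(rbind, parts)
}

# Toy data standing in for two company files (assumed column: person_id)
demo_dir <- file.path(tempdir(), "company_profiles_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(person_id = c(1, 2, 3)),
          file.path(demo_dir, "company-a_person_profile.csv"), row.names = FALSE)
write.csv(data.frame(person_id = c(3, 4)),
          file.path(demo_dir, "company-b_person_profile.csv"), row.names = FALSE)

profiles <- read_company_files(demo_dir)

# Count the distinct files each person appears in; more than one means
# the same id shows up under several companies.
files_per_person <- tapply(profiles$source_file, profiles$person_id,
                           function(f) length(unique(f)))
dup_ids <- names(files_per_person)[files_per_person > 1]
dup_ids
```

Pointing `read_company_files()` at the real profile folder would work the same way, provided every company file shares the same columns.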
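Company metadata
read.csv(file = 'company_level_info_with_grouped_sector.csv')
Everything is in one file.
LIWC scores
read.csv(file = 'LIWC_idividual_company_mapped.csv', nrows = 1000)
I do not think I have the x files … x = individuals, y = companies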
Company-individual LIWC similarity scores
read.csv(file = 'LIWC_idividual_company_mapped_similarity.csv', nrows = 1000)
Again, I do not think I have the x files … x = individuals, y = companies
Company LIWC scores
read.csv(file = 'LIWC2015_Company.csv', nrows = 1000)
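Individual LIWC scores
read.csv(file = 'LIWC2015_Individual1.csv', nrows = 1000)
read.csv(file = 'LIWC2015_Individual2.csv', nrows = 1000)
read.csv(file = 'LIWC2015_Individual3.csv', nrows = 1000)
Can't find the individual files. Why are these in three dfs? Size?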
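If the split into three data frames is about size, the sizes on disk and the column layouts are quick to compare before stacking the parts. A minimal sketch, using toy files in a temporary folder in place of the real LIWC exports; the `part*.csv` names and the columns `id` and `WC` are made up for illustration.

```r
# Toy stand-ins for a table that was exported in three parts
demo_dir <- file.path(tempdir(), "liwc_parts_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  part <- data.frame(id = (i - 1) * 2 + 1:2, WC = c(50, 60))
  write.csv(part, file.path(demo_dir, sprintf("part%d.csv", i)), row.names = FALSE)
}

paths <- list.files(demo_dir, pattern = "^part[0-9]+\\.csv$", full.names = TRUE)

# Size of each file on disk, in megabytes
size_mb <- file.info(paths)$size / 1024^2

# Read only the first row of each file to confirm the columns match
headers <- lapply(paths, function(p) names(read.csv(p, nrows = 1)))
same_columns <- all(vapply(headers, identical, logical(1), headers[[1]]))

# Stack the parts only if their columns agree
if (same_columns) {
  combined <- do.call(rbind, lapply(paths, read.csv))
}
```

Matching headers together with large `size_mb` values would suggest the split was made for size rather than content.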
Merged employment duration and distance metrics
read.csv(file = 'stay_term_and_distances.csv', nrows = 1000)
Probably makes sense to split indices … Are person ids really unique?
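Whether person ids are unique can be checked directly. The sketch below uses a toy data frame; the column names `person_id`, `company_id`, and `stay_term` are assumptions and would need to be swapped for whatever the merged file actually uses.

```r
# Toy stand-in for the merged table (assumed columns: person_id, company_id)
stays <- data.frame(
  person_id  = c(101, 102, 102, 103),
  company_id = c("a", "a", "b", "b"),
  stay_term  = c(12, 30, 8, 24)
)

# anyDuplicated() returns 0 when no value (or row) repeats
ids_unique   <- anyDuplicated(stays$person_id) == 0
pairs_unique <- anyDuplicated(stays[, c("person_id", "company_id")]) == 0

# Rows whose person_id occurs more than once, for inspection
repeated_ids <- stays$person_id[duplicated(stays$person_id)]
repeated <- stays[stays$person_id %in% repeated_ids, ]
```

If `ids_unique` is `FALSE` while `pairs_unique` is `TRUE`, the same person appears under several companies and the table needs a two-part key.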